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Abstract 

Given a directed acyclic graph G, and a set of values y on the vertices, the Isotonic Regression of 1 / is a vector 
X that respects the partial order described by G, and minimizes ||a; — y|| , for a specified norm. This paper gives 
improved algorithms for computing the Isotonic Regression for all weighted ^p-norms with rigorous performance 
guarantees. Our algorithms are quite practical, and variants of them can be implemented to run fast in practice. 


1 Introduction 

A directed acyclic graph (DAG) G{V, E) defines a partial order on V where u precedes v if there is a directed path 
from uto V. We say that a vector x € is isotonic (with respect to G) if it is a weakly order-preserving mapping of 
V into R. Let Zq denote the set of all x that are isotonic with respect to G. It is immediate that Xq can be equivalently 
defined as follows: 

Xq = {x & R'^ \xu<Xy for all (u, v) G E}. (1) 

Given a DAG G, and a norm H-H on R^, the Isotonic Regression of observations y G R'^, is given by a: G Xq that 
minimizes ||a; — y\\. 

Such monotonic relationships are fairly common in data. They allow one to impose only weak assumptions on the 
data, e.g. the typical height of a young girl child is an increasing function of her age, and the heights of her parents, 
rather than a more constrained parametric model. 

Isotonic Regression is an important shape-constrained nonparametric regression method that has been studied since 
the 1950’s|[I]|2l|3]. It has applications in diverse fields such as Operations Research ilia and Signal Processing 0. 
In Statistics, it has several applications (e.g. HUl), and the statistical properties of Isotonic Regression under the £ 2 - 
norm have been well studied, particularly over linear orderings (see El and references therein). More recently, Isotonic 
regression has found several applications in Learning ifTOl fTTl fl^ [T3l fT4l . It was used by Kalai and Sastry ifTOl to 
provably learn Generalized Linear Models and Single Index Models; and by Zadrozny and Elkan ifTSl . and Narasimhan 
and Agarwal HI towards constructing binary Class Probability Estimation models. 

*This research was partially supported by AFOSR Award FA9550-12-1-0175, NSF grant CCF-1111257, and a Simons Investigator Award to 
Daniel Spielman. 

^Code from this work is available at https : / /github. com/sachdevasushant/ Isotonic 
^Part of this work was done when this author was a graduate student at Yale University. 
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The most common norms of interest are weighted ^p-norms, defined as 




vev 


wf, 


IW^p 


max„gy Wy ■ \Zy\, 


p e [i,oo), 

p = QO, 


where Wy > 0 is the weight of a vertex v £ V. In this paper, we focus on algorithms for Isotonic Regression 
under weighted £p-norms. Such algorithms have been applied to large data-sets from Microarrays ESI, and from 

the web ESI Ell- 

Given a DAG G, and observations y £ , our regression problem can be expressed as the following convex 

program: 

min Her — y|l^ p such that for all (u, u) £-E. (2) 


1.1 Our Results 

Let \V\ = n, and \E\ = m. We’ll assume that G is connected, and hence m > n — 1. 

£p-norms, p < oo. We give a unified, optimization-based framework for algorithms that provably solve the Iso¬ 
tonic Regression problem forp £ [1, oo). The following is an informal statement of our main theorem (Theorem l3.1l l 
in this regard (assuming Wy are bounded by poly(n)). 

Theorem 1.1 (Informal). There is an algorithm that, given a DAG G, observations y, and b > 0, runs in time 
log^ n log ’^/s) , and computes an isotonic ccalg £ T-G such that 

I^alg - vWl^p < min ||x - y\\l + 5. 

The previous best time bounds were 0{nm log for p £ (1, oo) ESl and 0{nm -\- logn) forp = 1 El- 

foo-norms. For foo-norms, unlike £p-norms forp £ (l,oo), the Isotonic Regression problem need not have a 
unique solution. There are several specific solutions that have been studied in the literature (see ll20ll for a detailed 
discussion). In this paper, we show that some of them (Max, Min, and AVG to be precise) can be computed in time 
linear in the size of G. 

Theorem 1.2. There is an algorithm that, given a DAG G{V, E), a set of observations y £ R^, and weights w, runs 
in expected time 0{m), and computes an isotonic xinf £ Ig such that 

biNF - 2 /IL,oo = mm 11^ - 2 /IL.oo ■ 

xGXg 

Our algorithm achieves the best possible running time. This was not known even for linear or tree orders. The 
previous best running time was 0{m log n) ll20l . 

Strict Isotonic Regression. We also give improved algorithms for Strict Isotonic Regression. Given observations 
y, and weights w, its Strict Isotonic Regression ^strict is defined to be the limit of Xp as p goes to oo, where Xp is 
the Isotonic Regression for y under the norm IHI^ p . It is immediate that xsmet is an ioc Isotonic Regression for y. In 
addition, it is unique and satisfies several desirable properties (see ED)- 

Theorem 1.3. There is an algorithm that, given a DAG G{V, E), a set of observation y £ R^, and weights w, runs 
in expected time 0{mn), and computes xstrict , the strict Isotonic Regression of y. 

The previous best running time was rG) + nf log n) OTl . 

1.2 Detailed Comparison to Previous Results 

£p-norms, p < oo. There has been a lot of work for fast algorithms for special graph families, mostly for p = 1, 2 
(see ll22l for references). For some cases where G is very simple, e.g. a directed path (corresponding to linear orders), 
or a rooted, directed tree (corresponding to tree orders), several works give algorithms with running times of 0{n) or 
O(nlogn) (see 12^ for references). 
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Table 1: Comparison to previous best results for £p-norms, p ^ oo 



Previous best 

This paper 


£i £p,l < p < oo 

£p,p < oo 

d-dim vertex set, d > 3 

log*^ n lfT9l n lfT9l 

„1.5iogl-5(d-ri)^ 

arbitrary DAG 

nm + in? log n ifTSl nm log ^ ifTSll 

TTji-S log® n 


For sake of brevity, we have ignored the 0{-) notation implicit in the bounds, and o(log n) terms. The results are reported assuming 
an error parameter 6 — , and that are bounded by poly(n). 


Theorem 1 1.1 1 not only improves on the previously best known algorithms for general DAGs, but also on several 
algorithms for special graph families (see Table [T]i. One such setting is where is a point set in d-dimensions, and 
{u, v) G E whenever Ui < Vi for all i G [d]. This setting has applications to data analysis, as in the example given 
earlier, and has been studied extensively (see 12^ for references). For this case, it was proved by Stout (see Prop. 
2, ||23) that these partial orders can be embedded in a DAG with 0{n log'^”^ n) vertices and edges, and that this DAG 
can be computed in time linear in its size. The bounds then follow by combining this result with our theorem above. 

We obtain improved running times for all £p norms for DAGs with m = o(r? j log® n), and for d-dim point sets 
ford > 3. For d = 2, Stout B19II gives an 0{n log^ n) time algorithm. 

£oo-norms. For weighted £oo-norms on arbitrary DAGs, the previous best result was 0{m log n + n log^ n) due 
to Kaufman and Tamir ll24ll . A manuscript by Stout improves it to 0{m\ogn). These algorithms are based on 
parametric search, and are impractical. Our algorithm is simple, achieves the best possible running time, and only 
requires random sampling and topological sort. 

In a parallel independent work. Stout ||25]| gives 0(n)-time algorithms for linear order, trees, and d-grids, and an 
0{n log'^”^ n) algorithm for point sets in d-dimensions. Theorem ll.2l implies the linear-time algorithms immediately. 
The result for d-dimensional point sets follows after embedding the point sets into DAGs of size 0{n log'^”^ n), as for 
fp-norms. 

Strict Isotonic Regression. Strict Isotonic regression was introduced and studied in 121]. It also gave the only 
previous algorithm for computing it, that runs in time 0{mm{mn, n“) -I- n? logn). Theorem ll.3l is an improvement 
when m = o{n log n). 


1.3 Overview of the Techniques and Contribution 

£p-norms, p < oo. It is immediate that Isotonic Regression, as formulated in Equation (|2]i, is a convex programming 
problem. For weighted £p-norms with p < oo, applying generic convex-programming algorithms such as Interior 
Point methods to this formulation leads to algorithms that are quite slow. 

We obtain faster algorithms for Isotonic Regression by replacing the computationally intensive component of 
Interior Point methods, solving systems of linear equations, with approximate solves. This approach has been used to 
design fast algorithms for generalized flow problems IZ7l l28ll . 

We present a complete proof of an Interior Point method for a large class of convex programs that only requires 
approximate solves. Daitch and Spielman ll^ had proved such a result for linear programs. We extend this to ip- 
objectives, and provide an improved analysis that only requires linear solvers with a constant factor relative error 
bound, whereas the method from Daitch and Spielman required polynomially small error bounds. 

The linear systems in llZ7ll28ll are Symmetric Diagonally Dominant (SDD) matrices. The seminal work of Spiel¬ 
man and Teng Il29l gives near-linear time approximate solvers for such systems, and later research has improved these 
solvers further EollSI]. Daitch and Spielman 1^ extended these solvers to M-matrices (generalizations of SDD). 
The systems we need to solve are neither SDD, nor M-matrices. We develop fast solvers for this new class of matri¬ 
ces using fast SDD solvers. We stress that standard techniques for approximate inverse computation, e.g. Conjugate 
Gradient, are not sufficient for approximately solving our systems in near-linear time. These methods have at least a 
square root dependence on the condition number, which inevitably becomes huge in IPMs. 

foo-norms and Strict Isotonic Regression. Algorithms for foo-norms and Strict Isotonic Regression are based on 
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techniques presented in a recent paper of Kyng et al. 1^ . We reduce foo-norm Isotonic Regression to the following 
problem, referred to as Lipschitz learning on directed graphs in ll32l (see Section 0] for details) : We have a directed 
graph H, with edge lengths given by len. Given x € for every (m,u) £ E{H), define gradQ[a;](it,u) = 

max o|. Now, given y that assigns real values to a subset of V(H), the goal is to determine x £ 

that agrees with y and minimizes iaiax(^u,v)GE(H) gtadQ[a;](M, v). 

The above problem is solved in 0{m + n log n) time for general directed graphs in 1^ . We give a simple linear¬ 
time reduction to the above problem with the additional property that H is a DAG. For DAGs, their algorithm can be 
implemented to run in 0{m + n) time. 

It is proved in OTI that computing the Strict Isotonic Regression is equivalent to computing the isotonic vector 
that minimizes the error under the lexicographic ordering (see SectionlDi. Under the same reduction as in the £oo-case, 
we show that this is equivalent to minimizing grad^ under the lexicographic ordering. It is proved in ll^ that the 
lex-minimizer can be computed with basically n calls to ^oo-minimization, immediately implying our result. 

1.4 Further Applications 

The IPM framework that we introduce to design our algorithm for Isotonic Regression (IR), and the associated results, 
are very general, and can be applied as-is to other problems. As a concrete application, the algorithm of Kakade 
et al. |Il2] for provably learning Generalized Linear Models and Single Index Models learns 1-Lipschitz monotone 
functions on linear orders in O(n^) time (procedure LPAV). The structure of the associated convex program resembles 
IR. Our IPM results and solvers immediately imply an n}-^ time algorithm (up to log factors). 

Improved algorithms for IR (or for learning Lipschitz functions) on d-dimensional point sets could be applied 
towards learning d-dim multi-index models where the link-function is nondecreasing w.r.t. the natural ordering on 
d-variables, extending mniiia. They could also be applied towards constructing Class Probability Estimation (CPE) 
models from multiple classifiers, by finding a mapping from multiple classifier scores to a probabilistic estimate, 
extending Halil. 

Organization. We report experimental results in Section|2l An outline of the algorithms and analysis for £p-norms, 
p < oo, are presented in Section d In Section 1 we define the Lipschitz regression problem on DAGs, and give the 
reduction from £oo-norm Isotonic Regression. We defer a detailed description of the algorithms, and most proofs to 
the accompanying supplementary material. 

2 Experiments 

An important advantage of our algorithms is that they can be implemented quite efficiently. Our algorithms are based 
on what is known as a short-step method (see Chapter 11, lEl), that leads to an 0{^/m) bound on the number of 
iterations. Each iteration corresponds to one linear solve in the Hessian matrix. A variant, known as the long-step 
method (see 1^ ) typically require much fewer iterations, about log to, even though the only provable bound known 
is 0{m). 

Eor the important special case of £ 2 -Isotonic Regression, we have implemented our algorithm in Matlab, with long 
step barrier method, combined with our approximate solver for the linear systems involved. A number of heuristics rec¬ 
ommended in that greatly improve the running time in practice have also been incorporated. Despite the changes, 
our implementation is theoretically correct and also outputs an upper bound on the error by giving a feasible point to the 
dual program. Our implementation is available at https : //git hub . com/ sachdevasushant/ Isotonic 

In the figure, we plot average running times (with error bars denoting standard deviation) for £2 -Isotonic Regression 
on DAGs, where the underlying graphs are 2-d grid graphs and random regular graphs (of constant degree). The edges 
for 2-d grid graphs are all oriented towards one of the corners. Eor random regular graphs, the edges are oriented 
according to a random permutation. The vector of initial observations y is chosen to be a random permutation of 1 to 
n obeying the partial order, perturbed by adding i.i.d. Gaussian noise to each coordinate. Eor each graph size, and two 
different noise levels (standard deviation for the noise on each coordinate being 1 or 10), the experiment is repeated 
multiple time. The relative error in the objective was ascertained to be less than 1%. 
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3 Algorithms for ^^-norms, p < oo 


Without loss of generality, we assume y G [0,1]". Given 
p £ [1, oo), let p-ISO denote the following £p-norm Iso¬ 
tonic Regression problem, and OPTp-iso denote its opti¬ 
mum: 

mm lla: - y||P ^ . (3) 

Let denote the entry-wise p* power of w. We assume 
the minimum entry of is 1, and the maximum entry 
is tCmax < exp(n). We also assume the additive error 
parameter 5 is lower bounded by exp(—n), and that p < 
exp(n). We use the O notation to hide poly log log n 
factors. 

Theorem 3.1. Given a DAG G{V,E), a set of obser¬ 
vations y G [0,1]'^, weights w, and an error parame¬ 
ter 5 > 0, the algorithm ISOTONICiPM runs in time 
O log^ n log {'^P'^Lx/s)) , and with probability at least 1 


Running Time in Practice 



— i/n, outputs a vector xalg S Iq with 


||a:ALG — y\\^w,p — OPTp ISO + S. 


The algorithm ISOTONiCiPM is obtained by an appropriate instantiation of a general Interior Point Method (IPM) 
framework which we call ApproxIPM. 

To state the general IPM result, we need to introduce two important concepts. These concepts are defined formally 
in Supplementary Material Section IaTI The first concept is self-concordant barrier functions', we denote the class of 
these functions by SCB. A self-concordant barrier function / is a special convex function defined on some convex 
domain set S. The function approaches infinity at the boundary of S. We associate with each / a complexity parameter 
d{f) which measures how well-behaved / is. The second important concept is the symmetry of a point z w.r.t. S: 
A non-negative scalar quantity sym(z, S). A large symmetry value guarantees that a point is not too close to the 
boundary of the set. For our algorithms to work, we need a starting point whose symmetry is not too small. We later 
show that such a starting point can be constructed for the p-ISO problem. 

ApproxIPM is a primal path following IPM: Given a vector c, a domain D and a barrier function / £ SCB for 
D, we seek to compute (c, x) . To find a minimizer, we consider a function fc ,- f { x ) = f(x) -f 7 (c, x), and 

attempt to minimize fc,-y for changing values of 7 by alternately updating x and 7. As x approaches the boundary of 
D the f(x) term grows to infinity and with some care, we can use this to ensure we never move to a point x outside the 
feasible domain D. As we increase 7, the objective term (c, x) contributes more to Eventually, for large enough 
7, the objective value (c, x) of the current point x will be close to the optimum of the program. 

To stay near the optimum x for each new value of 7, we use a second-order method (Newton steps) to update x 
when 7 is changed. This means that we minimize a local quadratic approximation to our objective. This requires 
solving a linear system Hz = g, where g and H are the gradient and Hessian of / at x respectively. Solving 
this system to find z is the most computationally intensive aspect of the algorithm. Crucially we ensure that crude 
approximate solutions to the linear system suffices, allowing the algorithm to use fast approximate solvers for this 
step. ApproxIPM is described in detail in Supplementary Material Section IA.5I and in this section we prove the 
following theorem. 


Theorem 3.2. Given a convex bounded domain D C IR" and vector c £ M", consider the program 


min (c, x) . 


(4) 


Let OPT denote the optimum of the program. Let f £ SCB be a self-concordant barrier function for D. Given a 
initial point xq € D, a value upper bound K > sup{(c, x) : x G D}, a symmetry lower bound s < sym(xo, D), and 
an error parameter 0 < e < 1, the algorithm ApprOXIPM runs for 

7;p.v = 0(yA(7) log («(/)/-)) 
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iterations and returns a point Xapx, which satisfies < e. 

The algorithm requires 0{Tapx) multiplications of vectors by a matrix M{x) satisfying 9/io • H{x)~^ ^ M{x) ^ 
ii/io • H(x)~^, where H{x) is the Hessian of f at various points x € D specified by the algorithm. 

We now reformulate the p-lSO program to state a version which can be solved using the ApproxIPM framework. 
Consider points {x, t) € H” x M", and define a set 

Dg = {(cc, t) : for all u € V . |a:(u) — p(v)f — t(v) < 0} . 

To ensure boundedness, as required by ApproxIPM, we add the constraint {w^ ,t) < K. 

Definition 3.3. We define the domain Vk = {Ig x li-") C Dg H {{x, f) : < K} . 

The domain Vk is convex, and allows us to reformulate program Q with a linear objective: 

min {w^ ,t) such that {x, t) € Hk- (5) 

X,t 

Our next lemma determines a choice of K which suffices to ensure that programs (O and (|5]l have the same optimum. 
The lemma is proven in Supplementary Material Section [AA] 

Lemma 3.4. For all K > Hk is non-empty and bounded, and the optimum of program ([S]) is OPTp.iso- 

The following result shows that for program (|5]l we can compute a good starting point for the path following 
1PM efficiently. The algorithm GoodStart computes a starting point in linear time by running a topological sort 
on the vertices of the DAG G and assigning values to x according to the vertex order of the sort. Combined with 
an appropriate choice of t, this suffices to give a starting point with good symmetry. The algorithm GoodStart is 
specified in more detail in Supplementary Material Section lA~^ together with a proof of the following lemma. 

Lemma 3.5. The algorithm GOOdStart runs in time 0{m) and returns an initial point (xq, tg) that is feasible, and 
forK = 3nwP^^, satisfies sym((a:o, <o), > isnHtwf, ■ 

Combining standard results on self-concordant barrier functions with a barrier for p-norms developed by Hertog 
et al. Jill, we can show the following properties of a function Fk whose exact definition is given in Supplementary 
Material Section lA!^ 

Corollary 3.6. The function Fk is a self-concordant barrier for Fk and it has complexity parameter 9 {Fk) = 0{m). 
Its gradient is computable in 0{m) time, and an implicit representation of the Hessian Hpn can be computed in 
0{m) time as well. 

The key reason we can use ApproxIPM to give a fast algorithm for Isotonic Regression is that we develop an 
efficient solver for linear equations in the Hessian of Fk- The algorithm HessianSolve solves linear systems in 
Hessian matrices of the barrier function Fk- The Hessian is composed of a structured main component plus a rank 
one matrix. We develop a solver for the main component by doing a change of variables to simplify its structure, 
and then factoring the matrix by a block-wise -decomposition. We can solve straightforwardly in the L and 

Lfy, and we show that the D factor consists of blocks that are either diagonal or SDD, so we can solve in this factor 
approximately using a nearly-linear time SDD solver. The algorithm HessIANSOLVE is given in full in Supplementary 
Material Section lAJl along with a proof of the following result. 

Theorem 3.7. For any instance of program © given by some (G, y), at any point z € Fk, for any vector a, 
HesSIANSOLVE((G, y), z, p, a) returns a vector b = M a for a symmetric linear operator M satisfying ^/lo-Hp^^ {^)~^ 
M A Hp^{z)~^. The algorithm fails with probability < p,. HeSSIANSolve rani in time 0(m log nlog(l/p)). 

These are the ingredients we need to prove our main result on solving p-ISO. The algorithm ISOTONICIPM 
is simply ApproxIPM instantiated to solve program ©, with an appropriate choice of parameters. We state ISO- 
TONICIPM informally as Algorithm[T]below. ISOTONiCiPM is given in full as Algorithm|6]in Supplementary Material 
Section FA.SI 
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Proof of Theorem I3.lt IsoTONICIPM uses the symmetry lower bound s = ^ —p—> the value upper bound 

io?l piltmax 

K = SnWmax, and the error parameter e = -^ when calling ApproxIPM. By Corollary 13.61 the barrier function 
Fk used by IsOTONiCiPM has complexity parameter 0{Fk) < 0{m). By Lemma [331 the starting point {xo,to) 
computed by GOOdStart and used by IsOTONiCiPM is feasible and has symmetry sym(a:o, Vk) > 

io?L pil^max 

By Theorem 13.21 the point (Xapxifapx) output by ISOTONICIPM satisfies jt-opt where OPT is the 

optimum of program (|5]l, and K = Snwmax is the value used by ISOTONICIPM for the constraint {wP,t) < K, 
which is an upper bound on the supremum of objective values of feasible points of program (|5ll. By Lemma 13.41 
OPT = OPTp.,so. Hence, \\y - Xapx||^ < fapx) < OPT + eK = OPTp.iso + <5. 

Again, by Theorem l3.2l the number of calls to HessianSolve by ISOTONICIPM is bounded by 

0(T) < O log (e(f’^)/e.s)) < O (v^log {r^p^L./s)) . 

Each call to HessianSOLVE fails with probability < n~^. Thus, by a union bound, the probability that some 
call to HessianSolve fails is upper bounded by 0{-Jm\og{npwmm/5))/rt' = 0(l/n). The algorithm uses 
O (i/m log calls to HessianSolve that each take time 0{m log^ n), as p = n^. Thus the total running 

time is O {w}-^ log^ n log (”P™max/( 5 )). □ 


Algorithm 1: Sketch of Algorithm ISOTONICIPM 

1. Pick a starting point (a;, t) using the GOOdStart algorithm 

2. for r = 1, 2 

3. if r = 1 then 7 ^- l;c= — gradient of / at {x, t) 

4 . else 7 ^ 1; p ^ l/poly(n); c = (0, 

5. for j ^ 1,..., Cim®-® log m : 

6. pp ■ (1 + jC2m~^-^) 

7. Let H, g be the Hessian and gradient of fc,p at x 

8 . Call HessianSolve to compute z Ri H~^g 

9. Update x x — z 
10. Return x. 


4 Algorithms for ^oo and Strict Isotonic Regression 

We now reduce loo Isotonic Regression and Strict Isotonic Regression to the Lipschitz Learning problem, as defined 
in ll^ . Let G = {V,E, len) be any DAG with non-negative edge lengths len : E —>■ M>o, and y : V —^ M U {*} 
a partial labeling. We think of a partial labeling as a function that assigns real values to a subset of the vertex set V. 
We call such a pair (G, y) a partially-labeled DAG. For a complete labeling x : U —?► R, define the gradient on an 

edge {u,v) £ E due to x to be gradg[x](M,x) = max o| . If len(u, x) = 0, then gradQ[x](M, x) = 0 

unless x(m) > x(u), in which case it is defined as +c». Given a partially-labelled DAG (G, y), we say that a complete 
assignment x is an inf-minimizer if it extends y, and for all other complete assignments x' that extends y we have 

max gradQ[x](it, u) < max gradQ[x'](M, u). 

{u,v)^E {u,v)^E 

Note that when len = 0, then inax(^u,v)eE x) < c» if and only if x is isotonic on G. 

Suppose we are interested in Isotonic Regression on a DAG G{V,E) under H-Hj^ To reduce this problem to 
that of finding an inf-minimizer, we add some auxiliary nodes and edges to G. Let Vl , Vr be two copies of V. 
That is, for every vertex u £ V, add a vertex ur to Vr and a vertex ur to Vr. Let Er = {{ur,u)}u^v and 
Er = {(rt, We then let = i/tu(u) and len'(tt,u/j) = ^/w{u). All other edge lengths are set to 

0. Finally, let G' = (U U I 4 , U Vr, E U Er U Er, len'). The partial assignment y' takes real values only on the the 
vertices in Vr U Vr. For all u £ V, y'{uR) := y{u), y'{uR) := y{u) and y'{u) := +. (G', y') is our partially-labeled 
DAG. Observe that G' has n' = 3n vertices and m' = m + 2n edges. 
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Lemma 4.1. Given a DAG G{V, E), a set of observations y G and weights w, construct G' and y' as above. Let 

X be an inf-minimizer for the partially-labeled DAG {G', y'). Then, x \v is the Isotonic Regression ofy with respect 
to G under the norm || ■ |1^ ^ ■ 

Proof We note that since the vertices corresponding to V in {G',y') are connected to each other by zero length 
edges, max(„ grad(t[x](M, v) < oo iff x is isotonic on those edges. Since C? is a DAG, we know that there are 
isotonic labelings on G. When x is isotonic on vertices corresponding to V, gradient is zero on all the edges going 
in between vertices in V. Also, note that every vertex x corresponding to V in G' is attached to two auxiliary nodes 
XL G Vl, xr G Vr. We also have y'{xL) = y'{xR) = y{x). Thus, for any x that extends y and is Isotonic on G', the 
only non-zero entries in grad^ correspond to edges in Er and Er, and thus 

max grad= ma,xu;„ • \y{u) - x{u)\ = \\x - y\\^ ^ . 

{u,v)£E' uGV ’ 

Algorithm CompInfMin from |[^ is proved to compute the inf-minimizer, and is claimed to work for directed 
graphs (Section 5, |[32l). We exploit the fact that Dijkstra’s algorithm in COMPInfMin can be implemented in 
0{m) time on DAGs using a topological sorting of the vertices, giving a linear time algorithm for computing the inf- 
minimizer. Combining it with the reduction given by the lemma above, and observing that the size of G' is 0{m -I- n), 
we obtain Theorem ll.2l A complete description of the modihed CompInfMin is given in Section I bT 2] We remark 
that the solution to the £oo-Isotonic Regression that we obtain has been referred to as AVG foo Isotonic Regression in 
the literature Il20l . It is easy to modify the algorithm to compute the Max, Min £oo Isotonic Regressions. Details are 
given in SectionlBl 

For Strict Isotonic Regression, we define the lexicographic ordering. Given r G K™, let tt^ denote a permutation 
that sorts r in non-increasing order by absolute value, i.e., Vi G [m—1], |r(7rr(i))| > |r(7rr(i-f 1))|. Given two vectors 
r,s G K™, we write r ^lex s to indicate that r is smaller than s in the lexicographic ordering on sorted absolute 
values, i.e. 


3j G [m],\r{TTrU))\ < |s(7’'s(j))| and Vi G [j - 1], |r(7r^(i))| = |s(7r^(i))| 
or Vi G [m], |r(7rr(i))| = |s(7r^(i))| • 

Note that it is possible that r f.\ex s and s A|ex r while r s. It is a total relation: for every r and s at least one of 
r ^lex s or s A lex r is true. 

Given a partially-labelled DAG (G, y), we say that a complete assignment a; is a lex-minimizer if it extends y 
and for all other complete assignments x' that extend y we have gradQ[x] V|ex grad(t[x']. Stout iH] proves that 
computing the Strict Isotonic Regression is equivalent to finding an Isotonic x that minimizes Zu = Wu ■ (x„ — yu) 
in the lexicographic ordering. With the same reduction as above, it is immediate that this is equivalent to minimizing 
grad(j, in the lex-ordering. 

Lemma 4.2. Given a DAG G{V, E), a set of observations y G and weights w, construct G' and y' as above. Let 
X be the lex-minimizer for the partially-labeled DAG (G', y'). Then, x \v is the Strict Isotonic Regression ofy with 
respect to G with weights w. 

As for inf-minimization, we give a modification of the algorithm COMPLexMin from ll^ that computes the 
lex-minimizer in 0{mn) time. The algorithm is described in Section IbT 2] Combining this algorithm with the re¬ 
duction from Lemma I 472 I we can compute the Strict Isotonic Regression in 0{m'n') = 0{mn) time, thus proving 
Theorem ll.3l 

Acknowledgements. We thank Sabyasachi Chatterjee for introducing the problem to us, and Daniel Spielman for 
his advice, comments, and discussions. We also thank Quentin Stout for his suggestions on a previous version of the 
manuscript, and anonymous reviewers for their comments. 
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A IPM Definitions and Proofs 


A.l Definitions 

Given a positive definite n x n matrix A, we define the norm || -H^ by 

||a:||^ = 'Jx^Ax. 


Given a twice differentiable function / with domain Df, which has positive definite Hessian H{x) at some x € Df, 
we define 

\\y\\x = Mnix) > 

and when M is a matrix, let ||M|1^ denote the corresponding induced matrix norm. 

We let Bx{y, r) denote the open ball centered at y of radius r in the H-H^ norm. 

Again, suppose / is a twice differentiable convex function with Hessian H, defined on a domain Df. If for all 
X G Df we have 

Bx{x, 1) C Df, 

and for all y G Bx{x, 1) and aWvf^O we have 


1 - \\y-x\\x < 


< 


1 - lb - ■ 


then we say the function is self-concordant. We denote the set of self-concordant functions by SC. A key theorem 
about self-concordant functions is the following (Theorem 2.2.1 of Renegar 1351). 


Theorem A.l. Suppose f is a twice differentiable function with Hessian H, defined on a domain Df, and for all 
X G Df we have 

Bx{x, 1) C Df, 


Then f G SC iff 


Also f G SC iff 


I - H{x)-^Hiy)\l, ||/ - Hix)-^H{y)\\^ < 


If f G SC also satisfies sup^.^^)^ Il5a;(a^)llx < o®’ '^^at / is a self-concordant barrier function. Given any 

^(/) > sup^g£)j. ||pa:(a;)||^, we say 0(/) is a complexity parameter for/. We denote the set of self-concordant barrier 
functions by SCB. 

We need the following notion of symmetry. We state a definition that is equivalent to the definition used by Renegar 
(Section 2.3.4 of El). 

Definition A.2. Given a convex set S and a point x G S, the symmetry of x w.r.t. S is defined as 

sym(a::, 5*) = inf inf i f > 0 : a; -|- — - — G S 

zeds [ t 


A.2 A Barrier Function for Vk 

Hertog et al. (El proved the existence of self-concordant barrier functions for a class of domains including ones 
capable of expressing program ©. The exact statement we wish to employ can be found in lecture notes by Nemirovski 

El. 
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Theorem A.3. For every pair of variables {x,t) € and for every constant p > 1, a self-concordant barrier 

function f G SCB exists for the domain 

{(cc, t) G : \xf < t}. 

This barrier function is given by 

f{t,x) =-\og{e/P-x^)-2\ogf 
and has complexity parameter 0{f) < 4. 

We are now ready to introduce a number of barrier functions; 

= I ^ - log - (x(v) - y(v)f^ - 21ogf(z;) j + [ X! -log(a^(^) “ 

\vGV ) \{a,b)eE 

fxixf) = -\og{K- {wP,t)). 

FKix,t) = F{x,t) + fK{x,f). 

Proof of CoroUarv l3.6t To prove the corollary, we need the standard fact that — log a: is a self-concordant barrier for 
the domain a: > 0 with complexity parameter 1, as shown in Renegar’s section 2.3.1 ll35l . We also need standard results 
on composition of barrier functions (adding barriers and composition with an affine function), as given by Renegar’s 
Theorems 2.2.6, 2.2.7, 2.3.1, and 2.3.2 ll^ . Given these and Theorem lA.3l the corollary follows immediately. □ 



A.3 Fast Solver for Approximate Hessian Inverse 


Algorithm 2: Algorithm HessianSOLVE ((G, j/), (a:, f), /a, a); Given a p-ISO instance (G, y), a feasible point 
(a;, f) of program, a vector a, outputs vector b. 

2. TG- 1/50 

3. BLOCKSOLVE((G,t/),(a:,f),^, r) 

4. return RankOneMore (M, u, a) 


Algorithm 3: Algorithm BlockSOLVE ((G, y), (a;, t), p, r) 


1. Let r ^ i7^(a: © y). 

2. For each v G V, identify t{v, v) = t{v). 

3. Compute R, T and G as given by equations ®, (|7|i, and ®. 

4. S ^Q'^B{R-CT-^C'^)B^Q. 

5. Ms ^ SDDSolve(S', y,T). 

7 ^ 0 

[-Q'^GBT-i I ■ 

7. Return a procedure that given vector a returns vector 




Y-i 

0 


0 

Ms 


Za. 


Algorithm 4; Algorithm RankOneMore(M, it, a): Given a linear operator M, a vector it, and a vector a, 
outputs vector b. 
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1. w = Mu. 

2. z = Ma. 

3. Return 


b = z- 


T 

w a 

1 

I + M w 


We introduce an extended graph G = {V UVVJ E), which includes our original vertex set V, and a second 
vertex set 

V = {v : V €V} . 

We define an additional set of edges 

E = V &V} 

Given vectors t € IR^ and r G we define a function 

t) = [ XI “ ^og(t(ef^P - r(ef) - 2 log t(e) j + ( X “ log(’'(e))) • 

\eeB / \ee-E / 

We associate with G a matrix B known as the signed edge-vertex incidence matrix. B has rows indexed by the set 
V UV, and columns indexed by the set U i?. It is given by 

{ 1 if e = {a, b) G E U E for some b G V UV 

— 1 if e = {a, b) G E U E for some b gV fiV 
0 otherwise. 


Now, we define a vector x(By G IR^'^^by 


(a;© 2 /)(M) 


x{u) foruGV 

y{v) where v = u and u gV 


Note that 


E 


= \V\. Abusing notation, we identify the vector t G with the vector f £ by equating 


t{v, v) = t{v). We then get 

E{x, t) = h{B'^{x © y), t). 

We compute the Hessian Elh of h{r, t) in variables r and t. The Hessian can be expressed as a block matrix 


Hh = 


T G^ 
G R 


where T contains derivatives in two coordinates of t, while R contains derivatives in two coordinates in r, and G has 
the cross-terms. T and R are square diagonal matrices, and G is not generally square, but has non-zero entries on 
the principal diagonal. In fact, the only thing we will need about the explicit forms of these matrices is that they are 
efficiently computable. For completeness, we state them: 


Ti ^ 

\t{ef/P-r{ef 


- 1 


t{e 


L—2+2/p 


p) t[eY/P — r{eY t{eY 


where e G E 


and 


while 


R{e,e) = { (t(e)=/p-!'(e)0 + t(e)=/pX(e )2 fo^eGE 
l/r(e)^ for e G 


4 t(G^ ^ 


(7) 


(8) 


(9) 
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Finally, let Q denote the V UV 
only on the principal diagonal: 


X \V\ projection matrix which maps x to (x 0 0). It is a matrix with non-zeroes 


Q{v,v) 


1 foxv €V 
0 otherwise. 


To prove Theorem 13.71 we will need three results: The first is a theorem on fast SDD solvers proven by Koutis et 
al. BOI . 


Theorem A.4. Given annxn SDD matrix X with m non-zero entries, an error probability p, and an error parameter 
T, with probability > 1 — p the procedure SDDS0LVE(A, fi, r) returns an (implicitly represented) symmetric linear 
operator M satisfying 

(1 - t)X-^ + t)X-^. 

SDDS0LVE(X, /r, r) runs in time 0(m logn log(l//i) log(l/T)), and M can be applied to a vector in time 0(m logn log(l//i) log(l/r 
as well. 

Lemma A.5. Suppose X is a positive definite matrix, and r £ [0,1/5) is an error parameter, and we are given a 
symmetric linear operator M satisfying 


(1 - t)X-^ ^ M ^ (1 + r)A-\ 

and suppose we are given a vector u £ H". Then RankOneMORE(M, u, a) acts as a linear operator on a and 
returns a vector b = Zafor a symmetric matrix Z satisfying 

(1 — 5r)(A + uu^)~^ Z (1 5r)(A + uu^)~^. 

If M can be applied in time Tm, then RankOneMore runs in time 0 (Tm + n). 

Lemma A.6. For any instance of program 0 given by some (G, y), at any point z £ Dk, given an error probability 
p, and an error parameter t, with probability > 1 — /r the procedure BlOCKSOLVE(X, p, r) returns an (implicitly 
represented) symmetric linear operator M satisfying 

(1 — t)Hp{z)~^ M < (\-\- t)Hp{z)~^. 

BlOCKSolve(A, /i, r) runs in time 0{mlognlog(l/p) log(l/T)), and M can be applied to a vector in time 
0{m\ogn\og{l/p) log(l/r)) as well. 

We prove Lemmas lA.6l and lA.5l at the end of this section. 

Proof of Theorem 13.71 By Lemma p\.61 BlockSolve ((G, y), (x, t), p, 1/50) returns an implicitly represented 
linear operator M satisfying 


(l - ^) ^ ^ M (^1 +Hp{{x,t)) ^ 

This M satisfies the requirements of M in Lemma I a 3] with X = Hp{{x, t)) and t = 1/50. With u = 

where Hp{x, t) + uu^ = Hp^^ (x, t), we get that RankOneMore (M, u, a) returns a vector Za, for a symmetric 

matrix Z satisfying 

-^Hp^{x,t)-^ <Z < ^Hp^{x,t)-\ 

The total running time is 0{m logn log(l//r)), as the running time of BLOCKS OLVE dominates. The algorithms fails 
only if BlockSolve fails, which happens with probability < p. □ 
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Proof of Lemma rA.6t Note T is a diagonal matrix, so that its inverse can be computed in linear time. 
Using standard facts about the Hessian under function composition, we can express the Hessian of F as 


7 0 

'T 

C^' 

7 0 


T C'^B'^Q 

0 Q^B 

C 

R 

0 B'^Q 


Q^BC Q'^BRB'^Q. 


A block-wise LDU decomposition of Hp gives 


Hp = 


I 0 
Q'^BCT-^ I 



T 

o' 



0 

S 



I T-^C'^B'^Q 
0 / 


Where the matrix 


S' = Q'^BRB'^Q - Q^BCT-^C^B'^Q = Q^B{R - CT-^C^)B^Q 

is the Schur-complement of T in Hp. Now, R — CT~^C^ is the Schur-complement of T in H. A standard result about 
Schur complements states that H is positive definite if and only if both T and R — CT~^C^ are positive definite. We 
know that H is positive definite, and consequently R — CT~^C"^ is too. But R — CT~^C"^ is a diagonal matrix, 
and so every entry must be strictly positive. This implies that B{R — CT~^C^)B"^ is a Laplacian matrix. The 
matrix has 0{m) non-zero entries. Since S = Q^B{R — CT~^C'^)B'’"Q is a principal minor of a Laplacian matrix, 
it is positive dehnite and SDD. Because S is PD and SDD, by Theorem I A. 41 using SDDSOLVE we can compute an 
(implicitly represented) approximate inverse matrix Ms that satishes 


^ Ms ^{1 + t)S-^. ( 10 ) 

in time 0{m\ogn\og ^ log p). This call may fail with a probability < p. The matrix Ms can be applied in time 
d{m log n log i log i). 

A block-wise inversion of the Hessian gives 


We dehne 


h: 


7 -T-^C'^B^Q 

'T-i 

0 ■ 

/ O' 

0 I 

0 


-Q'^CBT-^ I 


M = 


7 -T-^C^B^Q 

'T-i 

0 ■ 


I o' 

0 I 

0 

Ms_ 


-Q^CBT-^ I 


( 11 ) 


( 12 ) 


By equations (fTOl i and (fTTT i. and the fact that for all matrices W, X implies WXW^ ^ WYW^, it follows that 

(1 - T)Hf^ ^M^ (1 + t)H^\ 


We observe that the output of BlockSolve ((G, y), {x, t) is a procedure which applies M. We require 
a constant number of linear time matrix operations (inversion of a diagonal matrix, multiplication of a vector with 
matrix), and one call to SDDSOLVE, which runs in time 0(m log n log ^ log i). This call dominates the running time 
of BlockSolve. The call to SDDSolve may fail with a probability < /i, and in that case BlockSolve also fails. 

□ 


Proof of Lemma [A.St From our assumptions about M and the computation in RankOneMore, it follows that 

b = Za. 


for some 


where r = | < 1/5 and 


Z = M - 


Muu^M 
1 + MvX ’ 


(1 - t)X-^ + t)X-^. 
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Thus, RankOneMore acts as a linear operator on a, and it is symmetric. Suppose Y = X -\- uu^, then by the 
Sherman-Morrison formula, 


=X-i- 






l + u^X-h 


We can restate the spectral inequalities for M as M = X ^ + E, for some symmetric matrix E with 

-rX-i <E < tX-\ 

We want to show that 


(l-(5)y-^ ^ (1+(5)F- 


where 6 = 5r. 

First observe that for any vector v, 


u'^Y-^v = v^X-^v- 


v^X ^uw^X 


X (v^X ^v){u^X ^u) — (jEX 


1 + u^X-h 


1 + u'^X 1 + u^X 

where in the latter expression, both terms are non-negative. Similarly 

rp rp v'^Muu^Mv v'^Mv [v^ Mv)iw^ Mu) — ivF MvY 

ii Zi) = ii JVfi^ —-=- 1 - 

l + u^XIu l + u^Mu 1 + u^XIu 

and again in the final expression, both terms are non-negative. We state two claims that help prove the main lemma. 

Claim A.7. 


1 


1 


1 + u'^Mu 1 + u'^X 


< 


1 


1 — T 1 -f u'^X 


Claim A.8. 


\{v^ X ^v){vL^ X — {[v^M v){vl^ Mu) — {v^ Mu)‘^)\ 

< 2(r -I- r^) X~^v){u^X~^u) — {v^X~^u)‘^) . 


We also make frequent use of the fact that 1 -f MMu > 1 -f (1 — t)M X > (1 — t)(1 -f X ^u). Thus 


Iv'^Zv -v^Y-^v\ < 


'j'^Mv — X 


1 + u^Mu 


+ v'^X-^v 


1 


1 


1 -f u^XIu 1 -f M^X-ir 


{v^Mv){u^Mu) — {u^Mv)"^ — {v^X ^v){u^X ^u) — {u^X 


1 -f u'^Mu 

+ {iv^X-^v){u^X-\) - (m'^X-^)2) 

2t 


1 


1 


< 


< 




1 — T 1 -f U^X ^U 
3t + 2t2 y , 


-f 


1 -I- u'^Mu 1 + u'^A~^u 
3t + 2r^ X~^v){u^ X~^u) — {uF X~^v)‘^ 


1 — r 


1 -f M^X-ii 


1 — T 


-v^Y-F. 


< M-v^Y-\. 


□ 
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Proof of Claim IAItI 


1 

1 


u^ Eu 

1 + Mu 

l + F^X-^u 


(1 + u'^Mu){l + u'^X~^u) 


^ 1 TU^X 

“ 1 + Mu 1 + u^X~^u 
T 1 

~ 1 — T 1 + X~^U 


Proof of Claim IaIsI Let 

V = av where vX~^v = 1, 
u = f5u where uX~"^u = 1. 

Also let u = "fv + where wX~^v = 0. Now 

1 = uX~^u = 7 ^ + (1 — 'y‘^)wX~^w, 


so wX = 1. Thus 

{v^X-\){u^X-\) - {v^X-\f = - 7^). 

And 


{v^Mv)(u^Mu) — v'^Mv{'yv + \/l — M{'jv + \/l — 7 ^)w)) 

— (^v'^XI{'yv + \/l — 7^?!))^ 


= a^/3^(l — 7 ^) [{v'^Mv){'w'^Mw) — (D’^Mw)^] 

= a^/3^(l — 7 ^) [(1 + v^Ev){l + ilFEw) — {iFEw)"^^ . 


Thus 


\{v'^ X~^v){u'^ X~^u) — {v^ X~^u)'^ — Mv){u'^ Mu) — {v^ Mu)'^)\ 

= a^/3^(l — 7 ^) |l — ((1 + v'^Ev){\ + uFEw) — {v'^Ew)'^) | 

= a^/3^(l — 7 ^) \v'^Ev + uFEw + {uFEw){'iFEv) — {v'^Ew)^ 

< — 7^)2(r + F)- 

To establish the final inequality, we used that || || < r, and hence 

|i)^i?r()| < T < T. 

Combined with Equation (fTSl) . this proves the claim. 


□ 


(13) 


□ 


A.4 Starting Point 

Algorithm 5: Algorithm GOOdStart: Given an instance (G, y), outputs feasible starting point (ccq, fo)- 

1. Use a linear time DFS to compute a topological sort on G to order vertices in a sequence (ui,..., u„), s.t. 
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for every edge {vi, Vj),i < j. 

2. for i 1,... ,n : 
xoivi) ^ ijn. 

3. for i ^ 1,..., n : 

fo(i’z) ^ ko('yi) - y{vi)\^ + 1. 

We prove the following claim, which in turn will help us prove Lemmasl3.4landl3.5r 

Claim A.9. Let (xq, to) be the point returned by GOOdStart. For every vertex v. 


0 < a;o(r') < 1. 

Proof. Follows immediately from the GoodStart algorithm. □ 

Proof of Lemma I3l4t First we consider another program minimizing a linear objective over a convex domain. 

min {w^,t) 

x.t 

s.t. (a:, t) S Dg n {Ig X M'^) 

Let OPTiin denote the optimal value of program (fl4li . The value OPTun is attained only when t{v) = |a;(u) — y{v)f 
for every vertex v, and when this holds, the program is exactly identical to program Q. Hence OPTun = OPTp.jso- 
Now observe that the point (xo,to) computed GoodStart is feasible for program (fl4l i. This is true because 
the topological sort ensures that for every edges (a, b), the indices ia and it, assigned to vertices a and b satisfy 
ia < ib and hence x{b) — x{a) = ^(ia — ib) > 0. Meanwhile, the assignment to{vi) = |a;o(ui) — y{vi)f + 1 
ensures that constraints on t are not violated. By Claim lA!9l (w;^,fo) < 2nWmax < K = ‘inwmax- Hence {xo,to) 
is also feasible for program Q. Thus, the domain Vk is non-empty, as (xo,fo) is contained in it. Let {x*,t*) 
be a feasible, optimal point for program (fT4l i. then clearly < {w'P,to) < K, so this point is feasible for 

program (|5]l, and thus OPTbnd < OPTiin = OPTp.jso • And, as program (fT4l i is a relaxation of program Q, it follows 
that OPTbnd > OPTiin = OPTp.iso- Thus OPTbnd = OPTp.iso- 

Finally, Dk is bounded, because for each vertex v, 0 < t{v) < K, and y{v) — < x{v) < y{v) + □ 

Proof of Lemma 13.51 Recall that 

sym(z, Vk) = inf inf ■[ s > 0 : z + ——— G Vk 

q&dT’K 1 S 


Hence for any norm H-H 


sym{p,VK) > 


infggo-p,^ l|g -pII 
sup^e9i?K \\P-P\\' 


We use a norm given by ||(a;,f)|| = ||a;||j^ + ||f|lo,n. By giving upper and lower bounds on the distance from (xojio) 
to the boundary of Vk in this norm, we can lower bound the symmetry of this point. 


max ||(x-a:o,f-fo)ll = , max ||x - + ||f - folL 

{t,x)£d'DK {t,x)£dT>K 

< 2 • + K < 6n<„. 

because for each vertex v, we have 0 < t{v) < K, and y{v) — K^^p < x{v) < y{v) + K^Ip. 

For every point {x, t) on the boundary of Vk, we lower bound the minimum distance to || {x — Xq, t — fo)|| by 
considering several conditions: 

1. The value constraint (1, t) < K is active, i.e. (1, t) = K. 

2. x{a) = x{b) for some edge (a, b) G E. 
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3. |a;(a) — y{a)'f = t{a) for some v G V. 

At least one of the above conditions must hold for {x,t) to be on the boundary of Vk- We will show that each 
condition individually is sufficient to lower bound the distance to (xq, to). 

ConditionU} (l,f) = K. Then 


{x-Xo,t-to)\\ > ||f-fo|loo ^ -p-^olli > - 


- Polli) >^iK-2n)> 


Condition^ x{a) = x{b) = 7 for some edge (a, h) G E. Then 


\\{x - Xo,t - to)\\ > llx-xolL 

> \ {\x{b) - a;o(6)| + |x(a) - a;o(a)|) 

= ^(l7-a;o(&)| + l7-a;o(a)|) 

> \ i\xo{b) - xo{a)\) > 

Condition^ \x{a) — y{a)\ = forsome a £ 17. Weconsidertwocases. First case is when ||f — folloo — 

This immediately implies ||(a: — ccq, t — fo)|| > 1/2. 

In the second case is when ||t — folloo write x{a) = xo{a) + A. 

|A + Xo{a) - y{a)f = t{a) > to{v) - \\t - folU 
> 1/2 + |a;o(a) - y{a)\^ 


As p > 1, the growth rate of |A + xo(a) — y{a)\^ is largest when |a;Q(a) — y{a)\ is maximized and as xq, y G [0,1], 
we get |a;o(a) — y{a)\ = 1, and hence | A| is minimized in this case. Thus ||A| +1|^ > 1/2 + 1 = 3/2. Consequently, 


|A|> 



1 = exp 


log(3/2) ' 

P 


- 1 > 


log(3/2) ^ 1 ^ 
P ~ 3p' 


Thus, 


sym{{xo,to),'DK) > 


min(l/(3p),l/(2n)) 

6n 


> 


1 

ISn'^pWmRX 


□ 


A.5 Primal Path Following IPM with Approximate Hessian Inverse 


Algorithm 6; Algorithm ISOTONICIPM: 

Run ApproxIPM with; 

Objective vector c = (0, wP) s.t. (0, wP)'^{x,t) = wP{v)t{v). 

Gradient function g = PEk- 

Hessian function M — HessianSOLVE with p = X/v?. 

Complexity parameter 0(/) = 9{Fk) = 0{m). 

Symmetry lower bound s = 2 ^ —r— • 

ioTT ptOmax 

Value upper bound K = 

Error parameter e = 

Starting point (xq, to) given by GoodStart(G, y). 
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APPROXIPM outputs (Xapxjiapx)- 
Return Xapx- 


Algorithm 7: Algorithm ApproxIPM: Given an objective vector c € IR", a gradient function g : IR" —IR", a 
Hessian function M : IR" x IR" —H", a complexity parameter 9{f), a feasible starting point xq, a symmetry 
lower bound s > 0, a value upper bound K > 0, and an error parameter e > 0, outputs a vector Xapx- 


1. X Xq- 

2. p^l. 


3. Ti^2Oy^log(3O0(/)(l + l/s)). 

4. for i ^ 1,..., Ti : 



6. - pg{xo)+g{x) 

1. X ^ X — M{x, z) 

8. a <r- c^M{x, c) 

9 - 5 ^ 

10. z ric + g{x) 

11. X X — M{x,z) 


12. T2^20y^log(M) 


13. for i ^ 1,..., T 2 : 



15. z ^ pc + g{x) 

16. X ^ X — M(x, z) 

17. return Xapx ^ x. 

in this section we prove l'heoreml3.2l We start by proving a central lemma shows that approximate Newton steps 

are sufficient to ensure convergence of our primal path following IPM. 

The rest of this section is a matter of connecting this statement with Renegar’s primal following machinery. 

Lemma A.IO. Assume f £ SC and is defined on a domain D. If 5 = ||i7(x)“^p(x)||j^^^^ Si t < 1, and 


(1 - t)H{x)-^ a M a (1 + r)i7(x)"\ 
then taking x+ = x — Mg{x) will ensure both that x+ G D and 



Proof. For brevity write = H(x). Firstly, 


\\x+-x\\h^ = |lMp(x)||^^ < (1 + t) ^g{x)\\^^ = (l + r)5 < 1, 


which guarantees feasibility of x+. Further, 


\\I-MH,\\\^= max (I - H,M)HM - MH,)y 
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Then 


^ 2 T TT 2 

< max T y HrV = t 
\\ v \\ h ^=1 


Now, 


\\H,-^g{x)-Mg{x)\\^^ = \\{I - g{x)\\^^ < \\I - \\H,-^g{x)\\^^ 

Hx~^g{x+) = H:^~^g{x) + f Ha;~^H{x + t{x+- x)){x+- x) dt 

Jo 

= {Hx~^g(x) — Mg{x)) + Mg(x) + / Hx~^H{x + t(x+ — x)) (x+ — x) dt 

Jo 

= iH^~^g{x) - Mg{x)) + [ [l - H^~^H{x + t{x+ - x))] Mg{x) dt 

Jo 


Thus, using Theorem lA. 1 1 


^g{x+)\\ <\\H^ ^g{x) - Mg{x)\\ + 


f [l — Hx ^H{x + t{x+— x))] Mg{x) dt 

Jo 

<T\\H^~^g{x)\\ + [ \\l-H^~^H{x + t{x+-x))\\ dt\\Mg{x)\\jj 

Jo 

< t (5 + (1 + t )5 [ \\l — Hx~^H(x + t(x+— x))\\„ dt 

Jo 

< t(5 + (1 + t)S j 

Jo 




< t(5 + 


0 (l-t(l + r)5)2 

((l + r)5)2 


— 1 di 


1 — (1 + t)S 

Finally, we can use the self-concordance of / to get 




< 


1 


1 — (1 + t )5 


tS ■ 


{0 + r)Sr 

1 — (1 + t)6 


For completeness, we now restate several results from a textbook by Renegar m- 


□ 


Definition A.ll. Consider a function / S SC with bounded domain Dj. Let Df be the closure of the domain. Given 
an objective vector c, we define the associated minimization problem as 


min (c, x) 

X 

subject to X £ D f, 

and, we define the associated g-minimization problem as 

min g{c,x)+f{x) 

X 

subject to X G Df. 

For each g, let z{g) G Df denote an optimum of the g-minimization problem. 


(15) 


( 16 ) 
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Using this definition, we can state two lemmas, which are proven by Renegar, and appear equations (2.13) and 
(2.14) in Iia. 

Lemma A.12. Given a function f £ SC with bounded domain Df and an objective vector c, let OPT denote the 
value of the associated minimization problem. Then for any ry > 0 and any x G Df 


Lemma A.13. Given a function f £ SCB with bounded domain Df and an objective vector c, let OPT denote the 
value of the associated minimization problem. Then for any ry > 0 and any x G Df 

(c, x) - OPT < ^0{f){l + llx - z(7y)IU(^)), 

where z{r]) is an optimum of the associated rj-minimization problem. 

The following is a restricted form of Renegar’s Theorem 2.2.5 ll35l . 

Lemma A.14. Assume f G SC. IfS = ||i/(a;)“^p(a;)||^ < 1/Afor some x G Df, then f has a minimizer z and 


Ik 


x\\x < ^ + 


3^2 

(1-^)3- 


The next lemma appears in Renegar ll^ as Proposition 2.3.7: 
Lemma A.15. Assume f G SCB. For all x,y G Df, 

||H(.)-<,W||, < (l + 


Proof of Theorem l3.2t Given a vector v, and 7 > 0, let fv,-^{x) = fix) +"f {v, x). Let 

riy^^ix) = Hix)~^ (gix) + jv) = pxix) + yHix)~^v. 

Now, for any 71 and 72 


,72 (^) - tly ^ j ^ ix ) -f f 1 I Qxix ). 




Thus 


7i 


^2 

\\ny,^^ix)\\^ < — + 

7i 


72 


7i 


72 

7i 


- 1 




Observe that for any 7 , the Hessian H{x) of / is also the Hessian of f.y. Consequently, we have f.y G SC because 


/ £ SCB. Thus by Lemma lA. 101 applied to the function fy^.y, if (5 = 


< ^, r < 1 , and 


■ y \-^ J \\ H { x ) - 2 
(1 - ^ M ^ (1 + t)H{x)-\ 

then for x+ = x — M (gix) + jv), we have x+ G Df„ ^ = Df and 

||n„.^(a:+)lk^ = \\Hix+)-\gix+) + 72t^)||^(,^) < ^ ^ 

Suppose we start with 

Iku.Tl ( 3 ^) llx — V9j 
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And take 


1 


72 



20 vW) 


71- 


Then using 9{f) > 1, we find 


||n„,72(a;)L ^ 1/6- 


For T = 1/10, letting x+ = x — M (g(x) + 72 ^), we get 


hv,jAx+)\\,^ = \\Hix+) i(g(x+) + 72t;)||^(^^) < (^1/60+^ii^^^ <1/9. 

Similarly, if we take 

then 

||n„.72(a;)L ^ 1 / 6 - 

So again, taking x+ = x — M {g{x) + 72 ^) gives 

\\nv,^Ax+)h^ = ||iT(x+)-i(g(x+) + 72t;)||^(^^) < ^ 

With these observations in mind, we are ready to prove the correctness of the ApproxIPM algorithm. 

We refer to the for loop in step|2]as phase 1 of the algorithm. In phase 1, we take vi = —g{xo), so 

riviA^) = H{x)~^ {g{x) - pg{xo)) ■ 

Initially, as a; = xq, so as p = 1, we ||n^j_p(a;)|| = 0 < 1/9. Thus, by our observations on decreasing 7 , we find 
that after each iteration of the for loop, we get ||rit,i,p(a;)||^ < 1/9, and after the i* iteration of the for loop, we get 

p< 1 ^ 1 - . When the for loop completes, we thus have 


^ X 20y?(7)log(30e(/)(l + l/s)) ^ 

20vW)j ■ 3O0(/)(l + l/s)- 

Hence, for the x obtained at the end of phase 1, by applying Lemma lA.lSI and our symmetry lower bound s, we get 

\\H{x)~^g{x)\\^ = ||piT(a;)“^ 5 (xo) +n„i,p(a;)||^ 

< p ||iF(a;)"^5(a;o)||^ + \\n^up[x)\\^ 

< p 6 »(/)(l + 1/s) + 1/9 < 1/30 + 1/9 = 13/90. 

We refer to steps fT2l and [Tbl as phase 2. In phase 2, we consider 

nc,T,ix) = H{x)~^ {g{x) + pc). 

Using y/c^Mc > ^H{x)~^c > ^ ||iT(a;)“^c||^ ., we get that at the start of step [ 12 ] 

IKc.7,(a;)|| = ||? 7 iT(x)"^c + iT(a;)"^ 5 (a;)||^ 

< 77 ||iT(a:)-ic||^ + ||iT(a:)-i5(a:)||^ 

< 4 + 13/90 = 1 / 6 . 


P< 1- 
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Hence, at the end of step[T 2 ] we get |lnc,,,(x)||^ < 1/9. Thus, at the end of each iteration of the for loop in stepfThl 
we also get \\ncA^)\l < 1/9. 


So once the loop completes, using V c^Mc < ^c||^, and that by Lemma lA.121 \H{x) ^ c ||^ < K — 

OPT, we have 



Nowfrom Uric,,,(x) 11^ < 1/9 and Lemma lATTl applied to fn ^. we get that l|a; — z(n)\\^ < l/9+3(l/9)^/(l—1/9)^ < 
1/6, and by the self-concordance of /, ||a; — z{r])\\^f^^^ < (l/ 6 )/(l — 1/6) = 1/5. Then by Lemma lA. 13 1 applied to 


/, we have 


(c, x) - OPT < ^(1 + ||x - z(p)L(,)) <e-{K- OPT). 


□ 


B Inf and Lex minimization on DAGs 

In this section, we show that given a partially labeled DAG (G, vq), we can hnd an inf-minimizer in Oim) time and a 
lex-minimizer in 0{mn) time. 

Notations and Convention. We assume that G = {V, E, len) is a DAG and the vertex set is denoted by C = 
{1, 2,..., n}. We further assume that the vertices are topologically sorted. A topological sorting of the vertices can 
be computed by a well-known algorithm in 0{m) time. This means that if (i, j) G E, then i < j. len : i? —?► ]R>o 
denotes non-negative edge lengths. For all x,y G V, by dist(x, y), we mean the length of the shortest directed path 
from X to y. It is set to oo when no such path exists. 

A path P in G is an ordered sequence of (distinct) vertices P = (xg, xi,..., Xk), such that (xi-i,Xi) € E for 
r € [k]. For notational convenience, we also refer to repeated pairs {x, x) as paths. The endpoints of P are denoted by 
dgP = Xo,diP = Xk- The set of interior vertices of P is dehned to be int(P) = {xi : 0 < i < k}. For 0 < i < j < 
k, we use the notation P[xi : Xj] to denote the subpath {xi ,..., Xj). The length of P is len(P) = xi). 

A function uq : 1^ —M U {*} is called a labeling (of G). A vertex x G V is a terminal with respect to vg iff 
vg{x) ^ *. The other vertices, for which vg{x) = *, are non-terminals. We let T{vg) denote the set of terminals with 
respect to vg. If T{vg) = V, we call vg a complete labeling (of G). We say that an assignment u : F —>■ K U {+} 
extends vg if v(x) = vg(x) for all x such that uo(a;) *. 

Given a labeling hq : 1^ —>■ M U {*}, and two terminals x,y G T{vg) for which {x, y) G E, we dehne the gradient 


on {x, y) due to vg to be 



Here and wherever applicable, we follow the convention 2 = Q, 0 • 00 = 0 and fimte^mber _ q \Yhen vg is a 
complete labeling, we interpret gradQ[uo] as a vector in K™, with one entry for each edge. 

A graph G along with a labeling v of G is called a partially-labeled graph, denoted (G, v). We say that a partially- 
labeled graph (G, uo) is a well-posed instance if for every vertex x G V, either there is a path from a; to a terminal 
t G T{vg) or there is a path from a terminal t G T(vg) to x. We note that instances arising from isotonic regression 
problem are well-posed instances and in fact satisfy a stronger condition. Every vertex lies on a terminal-terminal 
path. 

A path P in a partially-labeled graph (G, vg) is called a terminal path if both endpoints are terminals. We dehne 
V+P(uo) to be its gradient: 



If P contains no terminal-terminal edges (and hence, contains at least one non-terminal), it is a free terminal path. 
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Lex-Minimization. An instance of the Lex-Minimization problem is described by a partially-labeled graph 
{G,vo). The objective is to compute a complete labeling u : Vg K extending vq that lex-minimizes gradQ[u]. 
We refer to such a labeling as a lex-minimizer. Note that if T('(;o) = Vq, then trivially, vq is a lex-minimizer. 

Definition B.l. A steepest fixable path in an instance (G, vo) is a free terminal path P that has the largest gradient 
V^P(uo) amongst such paths. 

Observe that if P is a steepest fixable path with V”*'P(uo) > 0 then P must be a simple path. 

Definition B.2. Given a steepest fixable path P in an instance (G, uq), we define fixG['Uoi P] '■ —>■ R U {+} to be 

the labeling given by 


fixG[uo,P](a;) 


voidoP) - V+P(t;o) • lenG(P[9o-P : a;]) xG int(P) \T{vo), 
Vq (x) Otherwise. 


We say that the vertices x G int(P) are fixed by the operation fix[uo,P]. If we define vi = fixG[fo,T’], where 
P = (ccq, ... ,Xr) is the steepest fixable path in (G,vq), then it is easy to argue that for every i G [r], we have 
grad[ui](a;i_i,a;i) = V+P. 


B.l Sketch of the Algorithms 

We now sketch the ideas behind our algorithms and give precise statements of our results. A full description of all the 
algorithms is included in the appendix. 

We define the pressure of a vertex to be the gradient of the steepest terminal path through it; 

pressure[uo](a;) = max{V^P(t;o) | P is a terminal path in (G, vq) and x G P}. 

Observe that in a graph with no terminal-terminal edges, a free terminal path is a steepest fixable path iff its gradient 
is equal to the highest pressure amongst all vertices. Moreover, vertices that lie on steepest fixable paths are exactly 
the vertices with the highest pressure. For a given a > 0, in order to identify vertices with pressure exceeding a, we 
compute vectors vHigh[Q;](x) and vLow[a](a;) defined as follows in terms of dist, the metric on V induced by £: 

vLow[a](a;) = min {uoff)-f a • distfa;, 0} vHighfalfa;) = max {uo(f) — a ■ dist(f, a;)}. 
teT(Do) t£T{vo) 

Later in this section, we show how to find a steepest fixable path in expected time 0{m) for DAGs using the notion of 
pressure, and prove the following theorem about the SteepestPath algorithm (AlgorithmlQl). 

Theorem B.3. Given a well-posed instance (G, vq), SteepestPath(G, vq) returns a steepest terminal path in 0(jn) 
expected time. 

By repeatedly finding and fixing steepest fixable paths, we can compute a lex-minimizer. Theorem 3.3 in gives 
an algorithm Meta Lex that computes lex-minimizers given an algorithm for finding a steepest fixable path in (G, vq). 
Though the theorem is proven for undirected graphs, the same holds for directed graphs as long as the steepest path 
has gradient > 0. 

We state Theorem 5.2 from ll^ : 

Theorem B.4. Given a well-posed instance (G, vq) on a directed graph G, let vi be the partial voltage assignment 
extending vq obtained by repeatedly fixing steepest fixable (directed) paths P with VP > 0. Then, any lex-minimizer 
of (G, Vn) must extend V\ . Moreover, every v that extends v-i is a lex-minimizer of IG, Vn) if and only if for every edge 
eGEG\ {T{vi) X T{vi)), we have grad+[v]{e) = 0. 

When the gradient of the steepest fixable path is equal to 0, there may be more than one lex-minimizing assignment 
to the remaining non-terminals. But we can still label all the remaining vertices in 0(m) time by a two stage algorithm 
so that all the new gradients are zero, and thus by the above theorem we get a lex-minimizer. 
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Lemma B.5. Given a well-posed instance {G,vo), with T{vo) ^ Vq whose steepest fixable path has gradient 0, 
Algorithm AssignWithZeroGradient(G, wq) runs in time 0(m) and returns a complete labeling v that extends vq and 
has grad^[z;](e) = 0 for every e € Eq \ (T(vo) x T{vo)). 

Proof. Consider a well-posed instance (G, v^f)^ with T{vq) f- Vq whose steepest fixable path has gradient 0. In the 
first stage, AssignWithZeroGradientlabels all the vertices x such that there is a path from some terminal t G T to x. We 
label X with the label of the highest labeled terminal from which there is a path to x. This is the least possible label we 
can assign to x in order to not create any positive gradient edges. If this procedure creates any positive gradient edges, 
then it would imply that the the steepest path gradient was not 0 to begin with, which we know is false. Hence, this 
creates only 0 gradient edges. The steepest fixable path has zero gradient since after stage one, none of the unlabeled 
vertices lie on a terminal-terminal path. In the second stage, we label all the remaining vertices. An unlabeled vertex 
X is now labeled with the label of the least labeled terminal to which there is a path from x. It is again easy to see that 
this does not create any edges with positive gradient. The routine AssignWithZeroGradient (Algorithm flSTl achieves 
this in 0{m) time. □ 

On the basis of these results, we can prove the correctness and running time bounds for the COMPLexMin 
algorithm (Algorithm^ for computing a lex-minimizer. 

Theorem B.6. Given a well-posed instance {G,vo), COMPLexMin(G, rio) outputs a lex-minimizer whose steepest 
fixable path has gradient 0, v of {G, Vq). The algorithm runs in expected time 0{mn). 

B.1.1 Lex-minimization on Star Graphs 

We first consider the problem of computing the lex-minimizer on a star graph in which every vertex but the center is a 
terminal. This special case is a subroutine in the general algorithm, and also motivates some of our techniques. 

Let X be the center vertex, T = LLi Rhe the set of terminals, and all edges be of the form (x, t)iftGR and {t, x) 
if t G L. The initial labeling is given by u : T —R, and we abbreviate dist(x,f) by d{t) = len(x,f) if t G R and 
dist(f, x) by d(f) = len(f, x)iftG L. 

From Theorem IB .41 we know that we can determine the value of the lex minimizer at x by finding a steepest 
fixable path. By definition, we need to find ti G L,t 2 & R that maximize the gradient of the path from G to ( 2 , 

V+(G,G) = max I d(t 2 )+d(t 2 j ’ observed above, this is equivalent to finding a terminal with the highest 

pressure. We now present a simple randomized algorithm for this problem that runs in expected linear time. 

Theorem B.7. Given a pair of terminal sets {L, R), an initial labeling u : (L U i?) —>■ M, and distances d : L U i? —>■ 
R>o, StarSteepestPath(T, v, d) returns (( 1 ,( 2 ) with ti G L,t 2 G R maximizing , and runs in expected 

time 0{\L U R\). 

Proof The algorithm is described in Algorithm [TTl (named StarSteepestPath). Given a terminal ti G L (or 
G S R), we can compute its pressure a along with the terminal t 2 such that either G) = ain time 0(|r|) by 

scanning over the terminals in R (or terminals in L). Now sample a random terminal ti G L, and a random terminal 
G G R. Let «! be the pressure of G and 012 be the pressure of t 2 , and set a = maxjQfi, a 2 }- We will show that in 
linear time one can then find the subset of terminals T' = L' U R' such that L' C L,R' C R whose pressure is greater 
than OL. Assuming this, we complete the analysis of the algorithm. If L' = 0 (or R! = 0), G (or ( 2 ) is a vertex with 
highest pressure. Hence the path from G to G (or G to G) is a steepest fixable path, and we return (or (G, G))- 

If neither L' f % nor R' f % the terminal with the highest pressure must be in T', and we recurse by picking a new 
random ti G L' and G G R' ■ As the size of T' will halve in expectation at each iteration, the expected time of the 
algorithm on the star is 0(|r|). 

To determine which terminals have pressure exceeding a, we observe that the condition 3^2 G R . a < V+(G,G) = 
d(fi)+d(t 2 j ’ equivalent to 3G G R : u(G) + Q!d(f 2 ) < 'u(fi) — ad(G)- This, in turn, is equivalent to vLow[a](x) < 
v{ti) — Q!d(G)- We can compute vLow[q;](x) in deterministic 0(|T|) time. Similarly, we can check if 3G G L : a < 
V+(G, G) by checking if vHigh[a](x) > t;(G) + crd(fi). Thus, in linear time, we can compute the set T' of terminals 
with pressure exceeding a. □ 
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B.1.2 Lex-minimization on General Graphs 

In this section we describe and prove the correctness of the algorithm SteepestPath which finds the steepest fixable 
path in (G, vo) in 0{m) expected time. 

Theorem B.8. For a well-posed instance (G,vo) and a gradient value a > 0, ModDukstra computes in time 
0(m) a complete labeling v and an array parent : V ^ V U {null} such that, Vx € Vq, v ( x ) = minjg7’('^,u){z;o(f) + 
adist(t, x)}. Moreover, thepointerarray parent satisJiesVx (ji T (vq) such that parent{x) ^ null, vix) = t; (pa rent (a;))+ 
a ■ len(parent(x), x). 

As in the algorithm for the star graph, we need to identify the vertices whose pressure exceeds a given a. For a 
fixed a, we can compute vLow[a](x) and vHigh[a](x) for all x & Vq using topological ordering in 0{m) time. We 
describe the algorithms COMPVHigh, COMPVLow for these tasks in Algorithms 1^ and fTOl 

Corollary B.9. For a well-posed instance (G, uq) and a gradient value a > 0, let (vLow[a], LParent) t— COMPVLow(G, vo,a) 
and (vHigh[a], HParent) t— CompVHigh(G, uq, a). Then, vLow[a], vHigh[Q!] are complete labeling of G such that, 

Vx e Vg, 


vLow[a](x) = min {xo(f) + a ■ dist(x, f)} vHigh[Q;](x) = max {xQ(f) — a • dist(f, x)}. 

teT(Do) teT(«o) 

Moreover, the pointer arrays LParent, HParent saf/s/y Vx ^ T{vo), LParent(x), HParent(x) 7^ null and 

vLow[a](x) = vLow[a](LParent(x)) + a ■ dist(x, LParent(x)), 
vHigh[a](x) = vHigh[a](HParent(x)) — a ■ dist(HParent(x),x). 


The following lemma encapsulates the usefulness of vLow and vHigh. 

Lemma B.IO. For every x G Vc, pressure[z;o](x) > a vHigh[a](x) > vLow[a](x). 


Proof of Lemma IB.101 


vHigh[Q:](x) > vLow[q;](x) 


is equivalent to 

max {z;o(f) — a • dist(f, x)} > min {vo(t)-h a ■ dist(x,t)}, 
teT(vo) teT(vo) 

which implies that there exists terminals s,t G T{vq) such that 


vo{t) — a ■ dist(f, x) > 'Uo(s) + a • dist(x, s) 


thus. 


pressure[z;o](x) > 


vpit) - t;o(s) 
dist(f, x) + dist(x, s) 


> a. 


So the inequality on vHigh andvLow implies that pressure is strictly greater than a. On the other hand, if pressure[xo](x) > 
a, there exists terminals s,t G T{vo) such that 


Vo{t) - Ms) 
dist(<, x) + dist(x, s) 


pressure[?;o](x) > a. 


Hence, 


vo{t) — a ■ dist(f, x) > '!;o(s) + a • dist(x, s) 


which implies vHigh [a] (x) > vLow[a](x). 


□ 


It immediately follows from Lemma IB. 101 and Corollary IB. 9 1 that the algorithm COMPHighPressGraph described 
in Algorithm fT2l computes the vertex induced subgraph on the vertex set {x G Vg\ pressure[z;o](x) > a}, which 
proves the corollary stated below. 
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Corollary B.ll. For a well-posed instance {G, Vq) and a gradient value a > 0, COMPHiGHPressGraph(G', Vq, a) 
outputs a minimal induced subgraph G' ofG where every vertex x has pressure[iio](x) > a. 

We now describe an algorithm VertexSteepestPath that finds a terminal path P through any vertex x such 
that V+P(uo) = pressure[z;o](x) in expected 0{m) time. 

Theorem B.12. Given a well-posed instance {G, vq), anda vertex x € Vq, VertexSteepestPath(G, vq, x) returns 
a terminal path P through x such that V^P(t;o) = pressure[uo](a:) in 0{m) expected time. 

We can combine these algorithms into an algorithm SteepestPath that finds the steepest fixable path in (G, uq) 
in 0{m) expected time. We may assume that there are no terminal-terminal edges in G. We sample an edge (xi, X 2 ) 
uniformly at random from Eq, and a terminal X 3 uniformly at random from Vq- For i = 1,2, 3, we compute the 
steepest terminal path Pi containing Xi. By Theorem IB. 121 this can be done in Oim) expected time. Let a be 
the largest gradient maxi V’*'Pi. As mentioned above, we can identify G', the induced subgraph on vertices x with 
pressure exceeding a, in 0{m) time. If G' is empty, we know that the path Pi with largest gradient is a steepest 
fixable path. If not, a steepest fixable path in (G, vq) must be in G', and hence we can recurse on G'. Since we picked 
a uniformly random edge, and a uniformly random vertex, the expected size of G' is at most half that of G. Thus, we 
obtain an expected running time of 0{m). This algorithm is described in detail in Algorithm fT3] 

B.1.3 Linear-time Algorithm for Inf-minimization 

Given the algorithms in the previous section, it is straightforward to construct an infinity minimizes Let a* be the 
gradient of the steepest terminal path. From Lemma 3.5 in ll^ , we know that the norm of the inf minimizer is a*. 
Considering all trivial terminal paths (terminal-terminal edges), and using SteepestPath, we can compute a* in 
randomized 0{m) time. It is well known that vi = vLow[a*] and V 2 = vHigh[a*] are inf-minimizers. One 

slight issue occurs when a vertex x does not lie on a terminal-terminal path. In such a case, one of vLow[a*](x) or 
vLow[a*](a;) will not be finite. But the routine Assign WithZeroGradient described earlier can be used to fix the values 
of such vertices. It is also known that F(ui -I- ^ 2 ) is the inf-minimizer that minimizes the maximum .foo-norm distance 
to all inf-minimizers. For completeness, the algorithm is presented as Algorithm[TT] and we have the following result. 

Theorem B.13. Given a well-posed instance (G, uq), CompIneMin(G, uq) returns a complete labeling v of G ex¬ 
tending Vq that minimizes ||grad^[u] ||^ , and runs in 0{m) expected time. 

B.2 Algorithms 


Algorithm 8 : ModDijkstra(G, uq, a): Given a well-posed instance (G, vq ), a gradient value a > 0, outputs 
a complete labeling v of G, and an array parent : V V U {null}. 

1. for i = 1 to n 

2. if vo{i) f * then set v{i) = -boo else set v{i) = vo{i) 

3 . parent(i) null. 

4. for i = 1 to n 

5. tor j > i : {i,j) € Eg 

6 . ifu(j) > v{i) -b a ■ len(i, j) 

7. Decrease v(j) to v{i) -b a ■ len(i,}). 

8 . parent(j) t— i. 

9. return (v, parent) 


Algorithm 9: Algorithm COMPVLow(G, uq, a): Given a well-posed instance (G, uq), a gradient value a > 0, 
outputs vLow, a complete labeling for G, and an array LParent : L —t L U {null}. 
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1. (vLow, LParent) ^ ModDijkstra(G, -yo, a) 

2. return (vLow, LParent) 


Algorithm 10: Algorithm CompVHigh(G, vo,a): Given a well-posed instance (G, z;o), a gradient value a > 0, 
outputs vHigh, a complete labeling for G, and an array HParent : V ^ V U {null}. 

1. Let Gi denote the graph G with all edges reversed in direction. 

2. for X gVg 

3. if X G T(uo) then t;i(x) < -'yo(x) else ui(x) ^ vi{x). 

4. (temp, HParent) ^ ModDijkstra(Gi, ui,a) 

5. forx G Vgi : vHigh(x) ^-temp(x) 

6 . return (vHigh, HParent) 


Algorithm 11: Algorithm COMPInfMin(G, uq): Given a well-posed instance (G,vo), outputs a complete la¬ 
beling V for G, extending vg that minimizes ||grad’'’ M || ^. 

1. a ^ max{grad''‘[t;o](e) | e G i?G H (T(vo) x r(uo))}- 

2. Eg ^ Eg \ (T(uo) x T{vo)) 

3. P ^SteepestPath(G, wq)- 

4. a ^ max{a, V+P(uo)} 

5. (vLow, LParent) ^ CompVLow(G, vo,a) 

6 . (vHigh, HParent) ^ CompVHigh(G, vo,a) 

7. for X gVg 

8. if X G T{vo) 

9. then v{x) ^ vo(x) 

10. else if {vLow(x), vHigh(x)} fl (oo, —oo} = 0 then v(x) ^ ^ ■ (vLow(x) + vHigh(x)). 

11. else v(x) <— * 

12. V <— AssignWithZeroGradient(G, y) 

13. return y 


Algorithm 12: Algorithm COMPHighPressGraph(G, vq, cr): Given a well-posed instance (G, yo), a gradient 
value a > 0, outputs a minimal induced subgraph G' of G where every vertex has pressure[yo](-) > a. 

1. (vLow, LParent) ^ CompVLow(G, vq, a) 

2. (vHigh, HParent) ^ CompVHigh(G, yo, ct) 

3. Vg' ^ {x G Vg I vHigh(x) > vLow(x) } 

4. Eg' ^ {(x, y) &EG\x,y & Vg'}- 

5. G' ^ (L',L;',len) 

6. return G' 


Algorithm 13: Algorithm SteepestPath(G, yp): Given a well-posed instance (G,yo), with T{vg) ^ Vg, 
outputs a steepest free terminal path P in (G, vg). 

1. Sample uniformly random e G Eg - Let e = (xi, X 2 ). 

2. Sample uniformly random X 3 G Vg. 

3. for * = 1 to 3 

4. P -ir- VertexSteepestPath(G, uq, Xi) 
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5. Let j e argmaxjg{i _2 3 | V+Pj(t;o) 

6 . G' ^ CompHighPressGraph(G,wo, V+Pj(z)o)) 

7. if Eg' = 0, 

8. then return Pj 

9. else return SteepestPath(G', vo\v^,) 


Algorithm 14: Algorithm COMPLexMin(G, vq): Given a well-posed instance (G, t;o), outputs a lex-minimizer 
V of (G,uo)- 

1 . whaeT(uo) ^ Vg 

2. Eg ^ Eg\ (T(vo) x T(vo)) 

3. P ^ SteepestPath(G,wo) 

4. ifV+P = Othen Vo <— AssignWithZeroGradient(G, t;o) 

5. else uq ^ fix[r;o, ^’] 

6. return uo 


Algorithm 15: Algorithm AssignWithZeroGradient(G, tio): Given a well-posed instance (G, tio), with T(wo) 7 ^ 
Vg, outputs a complete labeling vq. 

1 . T^T{vo) 

2. for i = 1 to n : vo{i) ^ * 

3. tor j > i ■. {i,j) & Eg 

4. if vo(j) < vo(i) or vo(j) = * 

5- foO') ^ t'o(«) 

6 . T ^ T(vo) 

7. fori = nto 1 : vo(i) ^ * 

8 . for j <i: {j, i) € Eg and j 

9. if vo{j) > vo(i) or vo(j) = * 

10 . vo(j) vo(i) 

11 . return vq 


Algorithm 16: Algorithm VertexSteepestPath(G, vq, x): Given a well-posed instance (G, uq), and a vertex 
X G Vg, outputs a steepest terminal path in (G, uq) through x. 


1. Let L := {i G T{vo)\ there is a path from i to x} and R := {i G 7’(t^o)| there is a path from x to i} 

2 . if L = 0 or P = 0 then return {x, x) 

3. Compute dist(i, x) for all f S P and dist(a:, t) for a\\t G R 

4. if X G T(vo) 

s 7/, "o(^)-^o(v) . „n^ar(TTTiaY 

3 . t/i ^ argmaXyg^ ciist(a;.y) ’ 2/2 ^ argmaXj^gi dist(y,x) 


jf «o(x)-«o(yi 
dist(a:,yi) 


> 


'»o{y2)-Vo{x) 


“ dist(a:,yi) — dist(i/2,a:) 

7. then return a shortest path from x to yi 

8 . else return a shortest path from j /2 to x 


9. else 

10. for t G PUP, 

11. if f G P then d(f) ^ dist(i, x) else d(i:) dist(x, t) 

12. (fi, ^2) ^ StarSteepestPath(P, P, vqIlur, d) 
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13. Let Pi be a shortest path from ti to x. Let P 2 be a shortest path from a; to ^ 2 - 

14. P ^ {Pi,P 2 ). return P. 


Algorithm 17: StarSteepestPath(L, i?, v, d): Returns the steepest path in a star graph, with a single non¬ 
terminal connected to terminals in T, with lengths given by d, and labels given by v. 


1. Sample ti uniformly and randomly from L and t 2 uniformly and randomly from R 

2. Compute U £ argmax^g^ d(ti)+d(t) ^4 G argmax^g^ d(t^7+d(tj 

^ n ' TTi.nx f "(*i)-'»(*3) v{t4.)-v(t2) \ 

J. a ^ max d(ti)+d(t 3 ) ’ d{t 4 )+d{t 2 ) j 

4. Compute U|ow ^ minjg/{(z;(f) + a ■ d(t)) 

5. L' <- {t € L I v(t) > viow + OL ■ d(f)} 

6. Compute Uhigh ^ maxtgi(u(f) — a ■ d(f)) 

7. i?' ^ {f G i? I v{t) < Uhigh — a ■ d(f)} 

8. if L' U i?' = 0 then return {ti, ^ 2 ) 

9. else return StarSteepestPath(L', R',v\l'ur', dL'uR>) 
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